3D stereoscopic/multiview video processing system and its method

ABSTRACT

Disclosed is a stereoscopic/multiview three-dimensional video processing system and its method. In the present invention, stereoscopic/multiview three-dimensional video data having a plurality of images at the same time are coded into a plurality of elementary streams. The plural elementary streams output at the same time are multiplexed according to the user's selected display mode to generate a single elementary stream. After packetization of the single elementary stream continuously generated, information about the stereoscopic/multiview three-dimensional video multiplexing method and the selected display mode information are added to the packet header of the stream. Then the packetized elementary stream is sent to the image reproducer or stored in storage media. The present invention multiplexes the multi-channel elementary streams having the same temporal and spatial information, thereby minimizing the overlapping header information, and performs streaming of data suitable for the user's demand and the user system environments.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to a three-dimensional (3D) video processing system and its method. More specifically, the present invention relates to an apparatus and method for processing stereoscopic/multiview three-dimensional video images based on MPEG-4 (Moving Picture Experts Group-4).

[0003] 2. Description of the Related Art

[0004] MPEG is an information transmission method through video image compression and code representation and has been developed to the next-generation compression method, MPEG-7, subsequent to the current MPEG-1/2/4.

[0005] MPEG-4, i.e., the video streaming standard for freely storing multimedia data including video images in digital storage media on the Internet, is now in common use and is applicable to a portable webcasting MPEG-4 player (PWMP), etc.

[0006] More specifically, MPEG-4 is the standard for general multimedia including still pictures, computer graphics (CG), audio coding of analytical composition systems, composite audio based on the musical instrument digital interface (MIDI), and text, by adding compression coding of the existing video and audio signals.

[0007] Accordingly, the technology of synchronization among objects that are different from one another in attributes, as well as the object descriptor representation method for representing the attributes of the individual objects and the scene description information representation method for representing the temporal and spatial correlations among the objects, are matters of great importance.

[0008] In the MPEG-4 system, media objects are coded and transferred in the form of an elementary stream (ES), which is characterized by variables determining a maximum transmission rate on the network, QoS (Quality of Service) factors, and necessary decoder resources. The individual media object is composed of one elementary stream of a particular coding method and is streamed through a hierarchy structure, which comprises a compression layer, a sync layer, and a delivery layer.

[0009] The MPEG-4 system packetizes the data stream output from a plurality of encoders per access unit (AU) to process objects of different attributes and freely represents the data stream using the object descriptor information and the scene description information.

[0010] However, the existing MPEG-4 system standardizes only two-dimensional (hereinafter referred to as “2D”) multimedia data and therefore scarcely addresses the technology for processing stereoscopic/multiview 3D video data.

SUMMARY OF THE INVENTION

[0011] It is therefore an object of the present invention to process stereoscopic/multiview three-dimensional video data based on the existing MPEG-4 standards.

[0012] It is another object of the present invention to minimize the overlapping header information of packets by multiplexing multi-channel field-based elementary streams having the same temporal and spatial information into a single elementary stream.

[0013] It is a further object of the present invention to select data suitable for the user's demand and the user system environments, thereby facilitating streaming of the data.

[0014] In one aspect of the present invention, there is provided a stereoscopic/multiview three-dimensional video processing system, which is to process video images based on MPEG-4, the system including: a compressor for processing input stereoscopic/multiview three-dimensional video data to generate field-based elementary streams of multiple channels, and outputting the multi-channel elementary streams into a single integrated elementary stream; a packetizer for receiving the elementary streams from the compressor per access unit and packetizing the received elementary streams; and a transmitter for processing the packetized stereoscopic/multiview three-dimensional video data and transferring or storing the processed video data.

[0015] The compressor includes: a three-dimensional object encoder for coding the input stereoscopic/multiview three-dimensional video data to output multi-channel field-based elementary streams; and a three-dimensional elementary stream mixer for integrating the multi-channel field-based elementary streams into a single elementary stream.

[0016] The three-dimensional object encoder outputs elementary streams in the unit of 4-channel fields including odd and even fields of a left image and odd and even fields of a right image, when the input data are three-dimensional stereoscopic video data. Alternatively, the three-dimensional object encoder outputs N×2 field-based elementary streams to the three-dimensional elementary stream mixer, when the input data are N-view multiview video data.

[0017] The three-dimensional elementary stream mixer generates a single elementary stream by selectively using a plurality of elementary streams input through multiple channels according to a display mode for stereoscopic/multiview three-dimensional video data selected by a user. The display mode is any one mode selected from a two-dimensional video display mode, a three-dimensional video field shuttering display mode for displaying three-dimensional video images by field-based shuttering, a three-dimensional stereoscopic video frame shuttering display mode for displaying three-dimensional video images by frame-based shuttering, and a multiview three-dimensional video display mode for sequentially displaying images at a required frame rate.

[0018] The three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video data output from the three-dimensional object encoder into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of a right image, when the display mode is the three-dimensional video field shuttering display mode.

[0019] The three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video output from the three-dimensional object encoder into a single-channel access unit stream using 4-channel elementary streams in the order of the odd field elementary stream of a left image, the even field elementary stream of the left image, the odd field elementary stream of a right image, and the even field elementary stream of the right image, when the display mode is the three-dimensional video frame shuttering display mode.

[0020] The three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video output from the three-dimensional object encoder into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of the left image, when the display mode is the two-dimensional video display mode.

[0021] The three-dimensional elementary stream mixer multiplexes N×2 field-based elementary streams of N-view video output from the three-dimensional object encoder into a single-channel access unit stream sequentially using the individual viewpoints in the order of odd field elementary streams and even field elementary streams by viewpoints, when the display mode is the three-dimensional multiview video display mode.

[0022] When processing the elementary streams into a single-channel access unit stream and sending them to the packetizer, the compressor sends the individual elementary stream to the packetizer by adding at least one of image discrimination information representing whether the elementary stream is two- or three-dimensional video data, display discrimination information representing the display mode of the stereoscopic/multiview three-dimensional video selected by a user, and viewpoint information representing the number of viewpoints of a corresponding video image that is a multiview video image.

[0023] Hence, the packetizer receives a single-channel stream from the compressor per access unit, packetizes the received single-channel stream, and then constructs a packet header based on the additional information. Preferably, the packet header includes an access unit start flag representing which byte of a packet payload is the start of the stream, an access unit end flag representing which byte of the packet payload is the end of the stream, an image discrimination flag representing whether the elementary stream output from the compressor is two- or three-dimensional video data, a decoding time stamp flag, a composition time stamp flag, a viewpoint information flag representing the number of viewpoints of the video image, and a display discrimination flag representing the display mode.

[0024] In another aspect of the present invention, there is provided a stereoscopic/multiview three-dimensional video processing method that includes: (a) receiving three-dimensional video data, determining whether a corresponding video image is a stereoscopic or multiview video image, and processing the corresponding video data according to the determination result to generate multi-channel field-based elementary streams; (b) multiplexing the multi-channel field-based elementary streams in a display mode selected by a user to output a single-channel elementary stream; (c) packetizing the single-channel elementary stream received; and (d) processing the packetized stereoscopic/multiview three-dimensional video image and sending or storing the processed video image.

[0025] The step (a) of generating the elementary streams includes: outputting elementary streams in the unit of 4-channel fields including odd and even fields of a left three-dimensional stereoscopic image and odd and even fields of a right three-dimensional stereoscopic image, when the input data are three-dimensional stereoscopic video data; and outputting N×2 field-based elementary streams, when the input data are N-view multiview video data.

[0026] The multiplexing step (b) further includes multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary streams of a left image and the even field elementary streams of a right image, when the display mode is a three-dimensional video field shuttering display mode.

[0027] The multiplexing step (b) further includes multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 4-channel elementary streams in the order of the odd field elementary stream of a left image, the even field elementary stream of the left image, the odd field elementary stream of a right image and the even field elementary stream of the right image, when the display mode is a three-dimensional video frame shuttering display mode.

[0028] The multiplexing step (b) further includes multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of the left image, when the display mode is a two-dimensional video display mode.

[0029] The multiplexing step (b) further includes multiplexing N×2 field-based elementary streams of N-view video into a single-channel access unit stream sequentially using the individual viewpoints in the order of odd field elementary streams and even field elementary streams by viewpoints, when the display mode is a three-dimensional multiview video display mode.

[0030] The multiplexing step (b) includes: processing multiview three-dimensional video images to generate multi-channel elementary streams and using time information acquired from an elementary stream of one channel among the multi-channel elementary streams to acquire synchronization with elementary streams of the other viewpoints, thereby acquiring synchronization among the three-dimensional video images.

BRIEF DESCRIPTION OF THE DRAWINGS

[0031] The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate an embodiment of the invention, and, together with the description, serve to explain the principles of the invention:

[0032] FIG. 1 is a schematic of a stereoscopic/multiview 3D video processing system according to an embodiment of the present invention;

[0033] FIG. 2 is an illustration of information transmitted by ESI for the conventional 2D multimedia;

[0034] FIG. 3 is an illustration of input/output data of a stereoscopic 3D video encoder according to an embodiment of the present invention;

[0035] FIG. 4 is an illustration of input/output data of a 3D N-view video encoder according to an embodiment of the present invention;

[0036] FIG. 5 is an illustration of input/output data of a 3D ES mixer for stereoscopic video according to an embodiment of the present invention;

[0037] FIG. 6 is an illustration of input/output data of a multi-view 3D ES mixer according to an embodiment of the present invention;

[0038] FIG. 7 is a schematic of a field-based ES multiplexer for stereoscopic 3D video images for field shuttering display according to an embodiment of the present invention;

[0039] FIG. 8 is a schematic of a field-based ES multiplexer for stereoscopic 3D video images for frame shuttering display according to an embodiment of the present invention;

[0040] FIG. 9 is a schematic of a field-based ES multiplexer for stereoscopic 3D video images for 2D display according to an embodiment of the present invention;

[0041] FIG. 10 is a schematic of a field-based ES multiplexer for multiview 3D video images for 3D display according to an embodiment of the present invention;

[0042] FIG. 11 is a schematic of a field-based ES multiplexer for multiview 3D video images for 2D display according to an embodiment of the present invention;

[0043] FIG. 12 is an illustration of additional transfer information for the conventional ESI for processing stereoscopic/multiview 3D video images according to an embodiment of the present invention;

[0044] FIG. 13 is a schematic of a sync packet header for processing stereoscopic/multiview 3D video images according to an embodiment of the present invention;

[0045] FIG. 14 shows stream types defined by an MPEG-4 system; and

[0046] FIG. 15 is a 3D video image stream type for processing a stereoscopic/multiview 3D video image by a decoder.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0047] In the following detailed description, only the preferred embodiment of the invention has been shown and described, simply by way of illustration of the best mode contemplated by the inventor(s) of carrying out the invention. As will be realized, the invention is capable of modification in various obvious respects, all without departing from the invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not restrictive.

[0048] In the embodiment of the present invention, MPEG-4 stereoscopic/multiview 3D video data are processed. Particularly, the encoded field-based elementary streams output through multiple channels at the same time are integrated into a single-channel elementary stream according to the user's system environments and the user's selected display mode, and then multiplexed into a single 3D access unit stream (hereinafter referred to as “3D_AU stream”).

[0049] More particularly, the streaming is enabled to support all the four display modes: a two-dimensional video display mode, a three-dimensional video field shuttering display mode for displaying three-dimensional video images by field-based shuttering, a three-dimensional stereoscopic video frame shuttering display mode for displaying three-dimensional video images by frame-based shuttering, and a multiview three-dimensional video display mode for sequentially displaying images at a required frame rate by using a lenticular lens or the like.

[0050] To enable the multiplexing of the stereoscopic/multiview 3D video images and the above-mentioned four display modes defined by the user, the embodiment of the present invention generates new header information of a sync packet header and constructs the header with the overlapping information minimized. Furthermore, the embodiment of the present invention simplifies synchronization among 3D video images by using the time information acquired from one-channel elementary streams among multi-channel elementary streams for multiview video images at the same time, to acquire synchronization with the elementary streams of the other viewpoints.

[0051] FIG. 1 is a schematic of a stereoscopic/multiview 3D video processing system (hereinafter referred to as “video processing system”) according to an embodiment of the present invention.

[0052] The video processing system according to the embodiment of the present invention, which is to process stereoscopic/multiview 3D video data based on the MPEG-4 system, comprises, as shown in FIG. 1, a compression layer 10 supporting multiple encoders; a sync layer 20 receiving access unit (AU) data and generating packets suitable for synchronization; and a delivery layer 30 including a FlexMux 31 optionally given for simultaneous multiplexing of multiple streams, and a delivery multimedia integrated framework (DMIF) 32 for constructing interfaces to transport environments and storage media.

[0053] The compression layer 10 comprises various object encoders for still pictures, computer graphics (CG), audio coding of analytical composition systems, musical instrument digital interface (MIDI), and text, as well as 2D video and audio.

[0054] More specifically, the compression layer 10 comprises, as shown in FIG. 1, a 3D object encoder 11, a 2D object encoder 12, a scene description stream generator 13, an object descriptor stream generator 14, and 3D elementary stream mixers (hereinafter referred to as “3D_ES mixers”) 15 and 16.

[0055] The 2D object encoder 12 encodes various objects including still pictures, computer graphics (CG), audio coding of analytical composition systems, musical instrument digital interface (MIDI), and text, as well as 2D video and audio. The elementary stream output from the individual encoders in the 2D object encoder 12 is output in the form of an AU stream and is transferred to the sync layer 20.

[0056] The object descriptor stream generator 14 generates an object descriptor stream for representing the attributes of multiple objects, and the scene description stream generator 13 generates a scene description stream for representing the temporal and spatial correlations among the objects.

[0057] The 3D object encoder 11 and the 3D_ES mixers 15 and 16 are to process stereoscopic/multiview 3D video images while maintaining compatibility with the existing MPEG-4 system.

[0058] The 3D object encoder 11 is an object-based encoder for stereoscopic/multiview 3D video data, and comprises a plurality of 3D real image encoders for processing images actually taken by cameras or the like, and a 3D computer graphic (CG) encoder for processing computer-generated images, i.e., CG.

[0059] When the input data are stereoscopic 3D video images generated in different directions, the 3D object encoder 11 outputs elementary streams in the units of even and odd fields of left and right images, respectively. Conversely, when the input data are N-view 3D video images, the 3D object encoder 11 outputs N×2 field-based elementary streams to the 3D_ES mixers 15 and 16.

[0060] The 3D_ES mixers 15 and 16 process the individual elementary streams output from the 3D object encoder 11 into a single 3D_AU stream, and send the single 3D_AU stream to the sync layer 20.

[0061] The above-stated single 3D_AU stream output from the compression layer 10 is transferred to the sync layer via an elementary stream interface (ESI). The ESI is an interface connecting media data streams to the sync layer that is not prescribed by the ISO/IEC 14496-1 but is provided for easy realization, and accordingly, can be modified in case of need. The ESI transfers SL packet header information. An example of the SL packet header information transferred through the ESI in the existing MPEG-4 system is illustrated in FIG. 2. The SL packet header information is used by the sync layer 20 to generate an SL packet header.

[0062] To maintain temporal synchronization between or in the elementary streams, the sync layer 20 comprises a plurality of object packetizers 21 for receiving the individual elementary stream output from the compression layer 10 per AU, dividing it into a plurality of SL packets to generate a payload of individual SL packets and to generate a header of each individual SL packet with reference to information received for every AU via the ESI, thereby completing SL packets composed of the header and the payload.

[0063] The SL packet header is used to check continuity in case of data loss and includes information related to a time stamp.

[0064] The packet stream output from the sync layer 20 is sent to the delivery layer 30, and is processed into a stream suitable for interfaces to transport environments and storage media via the DMIF 32 after being multiplexed by the FlexMux 31.

[0065] The basic processing of the sync layer 20 and the delivery layer 30 is the same as that of the existing MPEG-4 system, and will not be described in detail.

[0066] Now, a description will be given as to a method for multiplexing stereoscopic/multiview 3D video images based on the above-constructed video processing system.

[0067] As an example, 2D images and multi-channel 3D images (including still or motion pictures) taken by at least two cameras, or computer-generated 3D images, i.e., CG, are fed into the 2D object encoder 12 and the 3D object encoder 11 of the compression layer 10, respectively. The multiplexing process for 2D images is well known to those skilled in the art and will not be described in detail.

[0068] The stereoscopic/multiview 3D video images that are real images taken by cameras are input to a 3D real image encoder 111 of the 3D object encoder 11, and the CG as a computer-generated 3D stereoscopic/multiview video image is input to a 3D CG encoder 112 of the 3D object encoder 11.

[0069] FIGS. 3 and 4 illustrate the operations of the plural 3D real image encoders and the 3D CG encoder, respectively.

[0070] When the input data are a stereoscopic 3D video image generated in the left and right directions, as shown in FIG. 3, the 3D real image encoder 111 or the 3D CG encoder 112 encodes left and right images or left and right CG data in the unit of fields to output elementary streams in the unit of 4-channel fields.

[0071] More specifically, the stereoscopic 3D real image or CG is encoded into a stereoscopic 3D elementary stream of left odd fields 3DES_LO, a stereoscopic 3D elementary stream of left even fields 3DES_LE, a stereoscopic 3D elementary stream of right odd fields 3DES_RO, and a stereoscopic 3D elementary stream of right even fields 3DES_RE.

[0072] When the input data are an N-view video image, the 3D real image encoder 111 or the 3D CG encoder 112 encodes N-view image or CG data in the unit of fields to output odd field elementary streams of first to N-th viewpoints, and even field elementary streams of first to N-th viewpoints.

[0073] More specifically, as shown in FIG. 4, the N-view video is encoded into N×2 elementary streams including an odd field elementary stream of the first viewpoint 3DES_#1 OddField, an odd field elementary stream of the second viewpoint 3DES_#2 OddField, . . . , an odd field elementary stream of the N-th viewpoint 3DES_#N OddField, an even field elementary stream of the first viewpoint 3DES_#1 EvenField, an even field elementary stream of the second viewpoint 3DES_#2 EvenField, . . . , and an even field elementary stream of the N-th viewpoint 3DES_#N EvenField.

[0074] As described above, the multi-channel field-based elementary streams output from the stereoscopic/multiview 3D object encoder 11 are input to the 3D_ES mixers 15 and 16 for multiplexing.

[0075] FIGS. 5 and 6 illustrate the multiplexing process of the 3D_ES mixers.

[0076] The 3D_ES mixers 15 and 16 multiplex the multi-channel field-based elementary streams into a 3D_AU stream to output a single-channel integrated stream. Here, the elementary stream data to be transferred are variable depending on the display mode. Accordingly, multiplexing is performed to transfer only the necessary elementary streams for the individual display mode.

[0077] There are four display modes: a 2D video display mode, a 3D video field shuttering display mode, a 3D video frame shuttering display mode, and a multiview 3D video display mode.

[0078] FIGS. 7 to 11 illustrate multiplexing examples for multi-channel field-based elementary streams depending on the display mode concerned. FIGS. 7, 8, and 9 show multiplexing methods for stereoscopic 3D video data, and FIGS. 10 and 11 show multiplexing methods for multiview 3D video data.

[0079] When the user selects the 3D video field shuttering display mode for stereoscopic 3D video data, the stereoscopic 3D elementary stream of left odd fields 3DES_LO and the stereoscopic 3D elementary stream of right even fields 3DES_RE among the 4-channel elementary streams output from the 3D object encoder 11 are sequentially integrated into a single-channel 3D_AU stream, as shown in FIG. 7.

[0080] When the user selects the 3D video frame shuttering display mode for stereoscopic 3D video data, the stereoscopic 3D elementary stream of left odd fields 3DES_LO, the stereoscopic 3D elementary stream of left even fields 3DES_LE, the stereoscopic 3D elementary stream of right odd fields 3DES_RO, and the stereoscopic 3D elementary stream of right even fields 3DES_RE among the 4-channel elementary streams are sequentially integrated into a single-channel 3D_AU stream, as shown in FIG. 8.

[0081] When the user selects the 2D video display mode for stereoscopic 3D video data, the stereoscopic 3D elementary stream of left odd fields 3DES_LO and the stereoscopic 3D elementary stream of left even fields 3DES_LE are sequentially integrated into a single-channel 3D_AU stream, as shown in FIG. 9.

[0082] When the user selects the 3D video display mode for multiview 3D video data, the elementary streams are integrated into a single-channel 3D_AU stream in the order of odd and even fields for every viewpoint and then in the order of viewpoints, as shown in FIG. 10. Namely, the elementary streams of a multiview video image are integrated into a single-channel 3D_AU stream in the order of the odd field elementary stream of the first viewpoint 3DES_#1 OddField, the even field elementary stream of the first viewpoint 3DES_#1 EvenField, . . . , the odd field elementary stream of the N-th viewpoint 3DES_#N OddField, and the even field elementary stream of the N-th viewpoint 3DES_#N EvenField.

[0083] When the user selects the 2D video display mode for multiview 3D video data, only the odd and even field elementary streams of one viewpoint are sequentially integrated into a single-channel 3D_AU stream, as shown in FIG. 11. Accordingly, the user is enabled to display images of his/her desired viewpoint in the 2D video display mode for multiview 3D video images.
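The stream selection and ordering described for FIGS. 7 to 11 can be summarized in a short Python sketch. It is illustrative only and not part of the disclosed embodiment: the names DisplayMode, stereo_mux_order, multiview_mux_order, and the selected_view parameter are hypothetical, and the sketch returns only the order of stream labels rather than multiplexing actual coded data.

```python
from enum import Enum


class DisplayMode(Enum):
    """The four display modes named in the description (enum is hypothetical)."""
    VIDEO_2D = "2D video display"
    FIELD_SHUTTERING_3D = "3D video field shuttering display"
    FRAME_SHUTTERING_3D = "3D video frame shuttering display"
    MULTIVIEW_3D = "multiview 3D video display"


def stereo_mux_order(mode):
    """Return the stream labels, in order, that go into one 3D_AU stream
    for stereoscopic input (FIGS. 7, 8, and 9)."""
    orders = {
        DisplayMode.FIELD_SHUTTERING_3D: ["3DES_LO", "3DES_RE"],
        DisplayMode.FRAME_SHUTTERING_3D: ["3DES_LO", "3DES_LE", "3DES_RO", "3DES_RE"],
        DisplayMode.VIDEO_2D: ["3DES_LO", "3DES_LE"],
    }
    if mode not in orders:
        raise ValueError(f"{mode} is not a stereoscopic display mode")
    return orders[mode]


def multiview_mux_order(mode, num_views, selected_view=1):
    """Return the stream labels, in order, for N-view input (FIGS. 10 and 11).

    selected_view is an assumed way of expressing the user's desired
    viewpoint in the 2D display mode.
    """
    if num_views < 1:
        raise ValueError("num_views must be at least 1")
    if mode is DisplayMode.MULTIVIEW_3D:
        views = range(1, num_views + 1)
    elif mode is DisplayMode.VIDEO_2D:
        if not 1 <= selected_view <= num_views:
            raise ValueError("selected_view is out of range")
        views = [selected_view]
    else:
        raise ValueError(f"{mode} is not a multiview display mode")
    order = []
    for view in views:
        # Odd field first, then even field, viewpoint by viewpoint.
        order.append(f"3DES_#{view} OddField")
        order.append(f"3DES_#{view} EvenField")
    return order
```

For example, stereo_mux_order(DisplayMode.FIELD_SHUTTERING_3D) yields the two labels 3DES_LO and 3DES_RE, which matches the reduction from four channels to two described for the field shuttering mode.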

[0084] As described above, the single-channel 3D_AU stream output from the 3D_ES mixers 15 and 16 is fed into the sync layer 20. In addition to the information transferred via the ESI, as shown in FIG. 2, the single-channel 3D_AU stream includes optional information for stereoscopic/multiview 3D video streaming according to the embodiment of the present invention.

[0085] The syntax and semantics of the information added to the stereoscopic/multiview 3D video data are defined in FIG. 12.

[0086] FIG. 12 shows the syntax and semantics of the information added to the single 3D_AU stream for stereoscopic/multiview 3D video images, where only the optional information other than the information transferred via the ESI is illustrated.

[0087] More specifically, three information sets such as a display discrimination flag 2D_3DDispFlag, and a viewpoint information flag NumViewpoint are additionally given, as shown in FIG. 12.

[0088] The display discrimination flag 2D_3DDispFlag represents the display mode for stereoscopic/multiview 3D video chosen by the user. In this embodiment, the display discrimination flag is, although not specifically limited thereto, “00” for the 2D video display mode, “01” for the 3D video field shuttering display mode, “10” for the 3D video frame shuttering display mode, and “11” for the multiview 3D video display mode.
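As a minimal illustration of the two-bit coding given for this embodiment, the following Python sketch maps display modes to flag values and back. The mode keys and the function names encode_display_flag and decode_display_flag are hypothetical; only the four bit patterns are taken from the description, and the description itself notes that the coding is not limited to these values.

```python
# Bit patterns for the display discrimination flag as listed in this
# embodiment; the dictionary keys are illustrative names only.
DISPLAY_FLAG_BITS = {
    "2D": "00",
    "3D_FIELD_SHUTTERING": "01",
    "3D_FRAME_SHUTTERING": "10",
    "MULTIVIEW_3D": "11",
}

# Reverse lookup used on the receiving side.
MODE_BY_BITS = {bits: mode for mode, bits in DISPLAY_FLAG_BITS.items()}


def encode_display_flag(mode):
    """Return the two-bit string for a display mode name."""
    if mode not in DISPLAY_FLAG_BITS:
        raise ValueError(f"unknown display mode: {mode!r}")
    return DISPLAY_FLAG_BITS[mode]


def decode_display_flag(bits):
    """Return the display mode name for a two-bit string."""
    if bits not in MODE_BY_BITS:
        raise ValueError(f"invalid display flag: {bits!r}")
    return MODE_BY_BITS[bits]
```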

[0089] The viewpoint information flag NumViewpoint represents the number of viewpoints for motion pictures. Namely, the viewpoint information flag is designated as “2” for stereoscopic 3D video data that are video images of two viewpoints, and as “N” for 3D N-view video data that are video images of N viewpoints.

[0090] The sync layer 20 receives the input elementary streams per AU, divides each AU into a plurality of SL packets to generate a payload of the individual SL packets, and constructs a sync packet header based on the information transferred via the ESI for every AU and the above-stated additional information for stereoscopic/multiview 3D video images (i.e., the display discrimination flag and the viewpoint information flag).

[0091] FIG. 13 illustrates the structure of a sync packet header that is header information added to one 3D_AU stream for stereoscopic 3D video data according to an embodiment of the present invention.

[0092] In the sync packet header shown in FIG. 13, an access unit start flag AccessUnitStartFlag represents which byte of the sync packet payload is the start of the 3D_AU stream. For example, the flag bit of “1” means that the first byte of the SL packet payload is the start of one 3D_AU stream.

[0093] An access unit end flag AccessUnitEndFlag represents which byte of the sync packet payload is the end of the 3D_AU stream. For example, the flag bit of “1” means that the last byte of the SL packet payload is the ending byte of the current 3D_AU stream.
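The way the start and end flags mark the boundaries of one 3D_AU when it is divided into several SL packets can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name packetize_access_unit, the max_payload parameter, and the dictionary representation of a packet are hypothetical, and the remaining header fields of FIG. 13 are omitted.

```python
def packetize_access_unit(access_unit, max_payload):
    """Split one 3D_AU (bytes) into SL-packet-like dictionaries.

    Only the first packet carries AccessUnitStartFlag = 1 and only the
    last packet carries AccessUnitEndFlag = 1. max_payload is an assumed
    upper bound on the payload size of a single packet.
    """
    if max_payload < 1:
        raise ValueError("max_payload must be positive")
    packets = []
    total = len(access_unit)
    for offset in range(0, total, max_payload):
        payload = access_unit[offset:offset + max_payload]
        packets.append({
            # 1 when the first payload byte is the start of the 3D_AU.
            "AccessUnitStartFlag": int(offset == 0),
            # 1 when the last payload byte is the end of the 3D_AU.
            "AccessUnitEndFlag": int(offset + len(payload) == total),
            "payload": payload,
        })
    return packets
```

A 3D_AU that fits into a single packet gets both flags set to 1 in that packet, and concatenating the payloads in order restores the original access unit.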

[0094] An object clock reference (OCR) flag represents how many object clock references follow. For example, the flag bit of “1” means that one object clock reference follows.

[0095] An idle flag IdleFlag represents the output state of the 3D_AU stream. For example, the flag bit of “1” means that 3D_AU data are not output for a predetermined time, and the flag bit of “0” means that 3D_AU data are output.

[0096] A padding flag PaddingFlag represents whether or not padding is present in the SL packet. For example, the flag bit of “1” means that padding is present in the SL packet.

[0097] The padding bit PaddingBits represents a padding mode to be used for the SL packet and has a default value of “0”.

[0098] A packet sequence number PacketSequenceNumber has a modulo value continuously increasing for the individual SL packet. A discontinuity detected at the decoder means a loss of at least one SL packet.
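A decoder-side continuity check on such a modulo counter might look like the following Python sketch. The function name count_missing_packets and the modulus parameter are hypothetical, since the description does not state the width of the counter; the same arithmetic would apply to any sequence number that wraps around.

```python
def count_missing_packets(previous_seq, current_seq, modulus):
    """Return how many sequence numbers were skipped between two
    consecutively received packets, assuming a counter that wraps at
    `modulus` (an assumed value, e.g. 2**k for a k-bit field).

    A result of 0 means the packets are consecutive. The result is only
    meaningful when fewer than `modulus` packets were lost, because a
    wrapped counter cannot distinguish larger gaps.
    """
    if modulus < 2:
        raise ValueError("modulus must be at least 2")
    return (current_seq - previous_seq - 1) % modulus
```

With a 4-bit counter (modulus 16), receiving sequence number 1 directly after 14 indicates that the packets numbered 15 and 0 were lost.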

[0099] The object clock reference (OCR) includes an OCR time stamp and exists in the SL packet header only when the OCR flag is set.

[0100] The flag bit of the access unit start flag AccessUnitStartFlag set to “1” represents that the first byte of the SL packet payload is the start of one 3D_AU, in which case information of the optional fields is transferred.

[0101] A random access point flag RandomAccessPointFlag having a flag bit set to “1” represents that random access to contents is enabled.

[0102] A 3D_AU sequence number 3D_AUSequenceNumber has a modulo value continuously increasing for the individual 3D_AU. A discontinuity observed at the decoder means a loss of at least one 3D_AU.

[0103] A decoding time stamp flag DecodingTimeStampFlag represents the presence of a decoding time stamp (DTS) in the SL packet.

[0104] A composition time stamp flag CompositionTimeStampFlag represents the presence of a composition time stamp (CTS) in the SL packet.

[0105] An instant bit rate flag InstantBitRateFlag represents the presence of an instant bit rate in the SL packet.

[0106] A decoding time stamp (DTS) is a DTS present in the related SL configuration descriptor and exists only when the decoding time differs from the composition time for the 3D_AU.

[0107] A composition time stamp (CTS) is a CTS present in the related SL configuration descriptor.

[0108] A 3D_AU length represents the byte length of the 3D_AU.

[0109] An instant bit rate represents the bit rate for the current 3D_AU, and is effective until the next instant bit rate field appears.

[0110] A degradation priority represents the priority of the SL packet payload.

[0111] A viewpoint information flag NumViewpoint represents the number of viewpoints of motion pictures. Namely, the viewpoint information flag is set to “2” for stereoscopic 3D video data that are motion pictures of two viewpoints; or the viewpoint information flag is set to “N” for 3D N-view video data.

[0112] A display discrimination flag 2D_3DDispFlag represents the display mode for 3D video data in the same manner as the case of stereoscopic 3D video data. In this embodiment, the display discrimination flag is set to “00” for the 2D video display mode, “01” for the 3D video field shuttering display mode, “10” for the 3D video frame shuttering display mode and “11” for the multiview video display mode.
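
The four two-bit codes of this embodiment can be captured in a small lookup, sketched here in Python. The enumeration and function names are hypothetical; only the code values and the modes they stand for come from the description above.

```python
from enum import IntEnum


class DisplayMode(IntEnum):
    """Two-bit display discrimination codes as listed in this embodiment."""
    VIDEO_2D = 0b00             # "00": 2D video display mode
    FIELD_SHUTTERING_3D = 0b01  # "01": 3D video field shuttering display mode
    FRAME_SHUTTERING_3D = 0b10  # "10": 3D video frame shuttering display mode
    MULTIVIEW_3D = 0b11         # "11": multiview video display mode


def parse_display_flag(bits: str) -> DisplayMode:
    """Map a two-character bit string such as "10" to its display mode."""
    if len(bits) != 2 or any(b not in "01" for b in bits):
        raise ValueError("display discrimination flag must be two bits")
    return DisplayMode(int(bits, 2))
```

For instance, parse_display_flag("10") yields DisplayMode.FRAME_SHUTTERING_3D.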

[0113] Once the header is constructed as described above, the sync layer 20 combines the header with the payload to generate an SL packet and sends the SL packet to the delivery layer 30.

[0114] After being multiplexed at the FlexMux 31, the SL packet stream transferred to the delivery layer 30 is processed into a stream suitable for an interface to transport environments via the DIMF 32 and sent to a receiver. Alternatively, the SL packet stream is processed into a stream suitable for an interface to storage media and is stored in the storage media.

[0115] The receiver decodes the processed packet stream from the video processing system to reproduce the original image.

[0116] In this case, the 3D object decoder at the receiver detects the stream format type of the multiplexed 3D_AU so as to restore the 3D video data in the stream format type of each multiplexed 3D_AU. Thus the 3D object decoder performs decoding after detecting the stream format type of the 3D_AU based on the values stored in the viewpoint information flag NumViewpoint and the display discrimination flag 2D_3DDispFlag among the information stored in the header of the received packet.

[0117] For example, when the viewpoint information flag NumViewpoint is “2” and the display discrimination flag 2D_3DDispFlag is “00” in the header of the transferred packet stream, stereoscopic 3D video data are to be displayed in the 2D video display mode and the 3D_AU is multiplexed in the order of the 3D elementary stream of left odd fields 3DES_LO and the 3D elementary stream of left even fields 3DES_LE, as shown in FIG. 10.

[0118] When the viewpoint information flag NumViewpoint is “2” and the display discrimination flag 2D_3DDispFlag is “01”, stereoscopic 3D video data are to be displayed in the 3D video field shuttering display mode and the 3D_AU is multiplexed in the order of the 3D elementary stream of left odd fields 3DES_LO and the 3D elementary stream of right even fields 3DES_RE, as shown in FIG. 8.

[0119] Finally, when the viewpoint information flag NumViewpoint is “2” and the display discrimination flag 2D_3DDispFlag is “10”, stereoscopic 3D video data are to be displayed in the 3D video frame shuttering display mode and the 3D_AU is multiplexed in the order of the 3D elementary stream of left odd fields 3DES_LO, the 3D elementary stream of left even fields 3DES_LE, the 3D elementary stream of right odd fields 3DES_RO, and the 3D elementary stream of right even fields 3DES_RE, as shown in FIG. 9.

[0120] On the other hand, when the viewpoint information flag NumViewpoint is “2” and the display discrimination flag 2D_3DDispFlag is “11”, stereoscopic 3D video data are to be displayed in the multiview 3D video display mode, a case that cannot occur.

[0121] When the viewpoint information flag NumViewpoint is “N” and the display discrimination flag 2D_3DDispFlag is “00”, multiview 3D video data are to be displayed in the 2D video display mode and the 3D_AU is multiplexed in the order of the odd field elementary stream of the first viewpoint 3DES_#1O and the even field elementary stream of the first viewpoint 3DES_#1E, as shown in FIG. 12.

[0122] When the viewpoint information flag NumViewpoint is “N” and the display discrimination flag 2D_3DDispFlag is “11”, multiview 3D video data are to be displayed in the multiview 3D video display mode and the 3D_AU is multiplexed in the order of all odd field elementary streams of the first to N-th viewpoints 3DES_#1O, ..., and 3DES_#NO and all even field elementary streams of the first to N-th viewpoints 3DES_#1E, ..., and 3DES_#NE, as shown in FIG. 11.

[0123] When the viewpoint information flag NumViewpoint is “N” and the display discrimination flag 2D_3DDispFlag is “10” or “01”, multiview 3D video data are to be displayed in the 3D video frame/field shuttering display mode, a case that seldom occurs.

[0124] As stated above, the receiver checks the stream format type of the 3D_AU multiplexed in the packet stream based on the values stored in the viewpoint information flag NumViewpoint and the display discrimination flag 2D_3DDispFlag of the header of the packet stream transferred from the video processing system according to the embodiment of the present invention, and then performs decoding to reproduce 3D video images.
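
The case analysis performed by the receiver can be summarized in a short Python sketch that maps the two header values to the order of elementary streams inside one 3D_AU. The function name and the use of bit strings are hypothetical. For the frame shuttering mode the sketch uses the four-stream order recited in claim 9, with the name 3DES_RO for the right odd field stream assumed by analogy with 3DES_RE. Rejecting the combinations described as not occurring or seldom occurring is a choice made for this sketch only, since no stream order is given for them.

```python
from typing import List


def stream_order(num_viewpoint: int, disp_flag: str) -> List[str]:
    """Return the elementary stream order of one 3D_AU for the given header values."""
    if num_viewpoint < 2:
        raise ValueError("NumViewpoint must be at least 2")
    if num_viewpoint == 2:
        # Stereoscopic 3D video: left/right images, odd/even fields.
        stereo = {
            "00": ["3DES_LO", "3DES_LE"],                        # 2D display
            "01": ["3DES_LO", "3DES_RE"],                        # field shuttering
            "10": ["3DES_LO", "3DES_LE", "3DES_RO", "3DES_RE"],  # frame shuttering
        }
        if disp_flag not in stereo:
            raise ValueError("no stream order defined for stereoscopic data "
                             "with flag " + repr(disp_flag))
        return stereo[disp_flag]
    # Multiview 3D video with N = num_viewpoint viewpoints.
    if disp_flag == "00":
        return ["3DES_#1O", "3DES_#1E"]  # first viewpoint only
    if disp_flag == "11":
        views = range(1, num_viewpoint + 1)
        # All odd field streams first, then all even field streams.
        return [f"3DES_#{v}O" for v in views] + [f"3DES_#{v}E" for v in views]
    raise ValueError("no stream order defined for multiview data "
                     "with flag " + repr(disp_flag))
```

For three viewpoints in the multiview display mode, the sketch returns the three odd field streams followed by the three even field streams.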

[0125] FIG. 14 shows stream types defined by the DecoderConfigDescriptor of the MPEG-4 system, and FIG. 15 shows a new stream type for determining whether an elementary stream of the stereoscopic 3D video image output from the compression layer is 2D or 3D video image data.

[0126] While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

[0127] As described above, the present invention enables stereoscopic/multiview 3D video processing in the existing MPEG-4 system.

[0128] Particularly, the multi-channel field-based elementary streams having the same temporal and spatial information are multiplexed into a single elementary stream, thereby minimizing the overlapping header information.

[0129] The present invention also simplifies synchronization among 3D video data: the time information acquired from the elementary stream of one channel, among the multi-channel elementary streams for multiview video data at the same time, is used to acquire synchronization with the elementary streams of the other viewpoints.

[0130] Furthermore, the multiplexing structure and the header construction of the present invention enable the user to selectively display stereoscopic/multiview 3D video data in the 3D video field/frame shuttering display mode, the multiview 3D video display mode, or the 2D video display mode, while maintaining compatibility with the existing 2D video processing system. Hence, the present invention can perform streaming of selected data suitable for the user's demand and system environments.

What is claimed is:
 1. A stereoscopic/multiview three-dimensional video processing system, which is based on MPEG-4, the system comprising: a compressor for processing input stereoscopic/multiview three-dimensional video data to generate field-based elementary streams of multiple channels, and outputting the multi-channel elementary streams into a single integrated elementary stream; a packetizer for receiving the elementary streams from the compressor per access unit and packetizing the received elementary streams; and a transmitter for processing the packetized stereoscopic/multiview three-dimensional video data and transferring or storing the processed video data.
 2. The system as claimed in claim 1, wherein the compressor comprises: a three-dimensional object encoder for encoding the input stereoscopic/multiview three-dimensional video data to output multi-channel field-based elementary streams; and a three-dimensional elementary stream mixer for integrating the multi-channel field-based elementary streams into a single elementary stream, and outputting the same.
 3. The system as claimed in claim 2, wherein the three-dimensional object encoder outputs elementary streams in the unit of 4-channel fields including odd and even fields of a left three-dimensional stereoscopic image and odd and even fields of a right three-dimensional stereoscopic image, when the input data are three-dimensional stereoscopic video data.
 4. The system as claimed in claim 2, wherein the three-dimensional object encoder outputs N×2 field-based elementary streams to the three-dimensional elementary stream mixer, when the input data are N-view multiview video data.
 5. The system as claimed in claim 2, wherein the compressor comprises: an object descriptor stream generator for generating an object descriptor stream for representing the attributes of multiple multimedia objects; a scene description stream generator for generating a scene description stream for representing the temporal and spatial correlations among objects; and a two-dimensional encoder for encoding 2-dimensional multimedia data.
 6. The system as claimed in claim 2, wherein the three-dimensional elementary stream mixer generates a single elementary stream by selectively using a plurality of elementary streams input through multiple channels according to a display mode for stereoscopic/multiview three-dimensional video selected by a user.
 7. The system as claimed in claim 6, wherein the display mode is any one mode selected from a two-dimensional video display mode, a three-dimensional video field shuttering display mode for displaying three-dimensional video images by field-based shuttering, a three-dimensional stereoscopic video frame shuttering display mode for displaying three-dimensional video images by frame-based shuttering, and a multiview three-dimensional video display mode for sequentially displaying images at a required frame rate.
 8. The system as claimed in claim 6, wherein the three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video output from the three-dimensional object encoder into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of a right image, when the display mode is the three-dimensional video field shuttering display mode.
 9. The system as claimed in claim 6, wherein the three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video output from the three-dimensional object encoder into a single-channel access unit stream using 4-channel elementary streams in the order of the odd field elementary stream of a left image, the even field elementary stream of the left image, the odd field elementary stream of a right image, and the even field elementary stream of the right image, when the display mode is the three-dimensional video frame shuttering display mode.
 10. The system as claimed in claim 6, wherein the three-dimensional elementary stream mixer multiplexes 4-channel field-based elementary streams of stereoscopic three-dimensional video output from the three-dimensional object encoder into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of the left image, when the display mode is the two-dimensional video display mode.
 11. The system as claimed in claim 6, wherein the three-dimensional elementary stream mixer multiplexes N×2 field-based elementary streams of N-view video output from the three-dimensional object encoder into a single-channel access unit stream sequentially using the individual viewpoints in the order of odd field elementary streams and even field elementary streams by viewpoints, when the display mode is the three-dimensional multiview video display mode.
 12. The system as claimed in claim 1, wherein when processing the elementary streams into a single-channel access unit stream and sending them to the packetizer, the compressor sends the individual elementary stream to the packetizer by adding at least one of image discrimination information representing whether the elementary stream is two- or three-dimensional video data, display discrimination information representing the display mode of the stereoscopic/multiview three-dimensional video selected by a user, and viewpoint information representing the number of viewpoints of a corresponding video image that is a multiview video image.
 13. The system as claimed in claim 12, wherein the packetizer receives a single-channel stream from the compressor per access unit, packetizes the received single-channel stream, and then constructs a packet header based on the additional information, wherein the packet header includes an access unit start flag representing which byte of a packet payload is the start of the stream, an access unit end flag representing which byte of the packet payload is the end of the stream, an image discrimination flag representing whether the elementary stream output from the compressor is two- or three-dimensional video data, a decoding time stamp flag, a composition time stamp flag, a viewpoint information flag representing the number of viewpoints of the video image, and a display discrimination flag representing the display mode.
 14. A stereoscopic/multiview three-dimensional video processing method, which is based on MPEG-4, the method comprising: (a) receiving three-dimensional video data, determining whether a corresponding video image is a stereoscopic or multiview video image, and processing the corresponding video data according to the determination result to generate multi-channel field-based elementary streams; (b) multiplexing the multi-channel field-based elementary streams in a display mode selected by a user to output a single-channel elementary stream; (c) packetizing the single-channel elementary stream received; and (d) processing the packetized stereoscopic/multiview three-dimensional video image and sending or storing the processed video image.
 15. The method as claimed in claim 14, wherein the step (a) of generating the elementary streams comprises: outputting elementary streams in the unit of 4-channel fields including odd and even fields of a left three-dimensional stereoscopic image and odd and even fields of a right three-dimensional stereoscopic image, when the input data are three-dimensional stereoscopic video data; and outputting N×2 field-based elementary streams, when the input data are N-view multiview video data.
 16. The method as claimed in claim 15, wherein the multiplexing step (b) further comprises: multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary streams of a left image and the even field elementary streams of a right image, when the display mode is a three-dimensional video field shuttering display mode.
 17. The method as claimed in claim 15, wherein the multiplexing step (b) further comprises: multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 4-channel elementary streams in the order of the odd field elementary stream of a left image, the even field elementary stream of the left image, the odd field elementary stream of a right image and the even field elementary stream of the right image, when the display mode is a three-dimensional video frame shuttering display mode.
 18. The method as claimed in claim 15, wherein the multiplexing step (b) further comprises: multiplexing 4-channel field-based elementary streams of stereoscopic three-dimensional video into a single-channel access unit stream using 2-channel elementary streams in the order of the odd field elementary stream of a left image and the even field elementary stream of the left image, when the display mode is a two-dimensional video display mode.
 19. The method as claimed in claim 15, wherein the multiplexing step (b) further comprises: multiplexing N×2 field-based elementary streams of N-view video into a single-channel access unit stream sequentially using the individual viewpoints in the order of odd field elementary streams and even field elementary streams by viewpoints, when the display mode is a three-dimensional multiview video display mode.
 20. The method as claimed in claim 14, wherein the multiplexing step (b) comprises: processing multiview three-dimensional video images to generate multi-channel elementary streams and using time information acquired from an elementary stream of one channel among the multi-channel elementary streams to acquire synchronization with elementary streams of the other viewpoints, thereby acquiring synchronization among the three-dimensional video images.
 21. The system as claimed in claim 1, wherein the DecoderConfigDescriptor includes a 3D video image stream type so as to process a stereoscopic/multiview 3D video image.